The actual infection counts are estimated from fatality data instead of the more biased (or even meaningless) positive test data. The only input to the model is the time-to-death distribution, i.e., the distribution of the time between infection and death. The approach is simple: using the time-to-death distribution in reverse, each death is randomly assigned to an infection day. The number of deaths attributed to infections on a given date (adjusted for censoring), divided by the infection fatality rate (IFR), is an estimate of the true infection count on that day. Adjusting for censoring means correcting for people who were infected on previous days but for whom not enough time has elapsed to know whether they will survive.
Based on [REFs], I am using a shifted negative binomial (mean=23.9, size=7.9) distribution for time to death (given death from COVID-19):
Equation here
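As a rough illustration of the distribution named above, the following sketch evaluates a negative binomial PMF in the mean/size parametrization using only the Python standard library. The function names and the `shift` argument are hypothetical: the source does not state the size of the shift, nor whether the mean of 23.9 applies before or after shifting, so `shift` defaults to 0 and the mean is taken to describe the unshifted part.

```python
import math

def nbinom_pmf(k, mean=23.9, size=7.9):
    """P(X = k) for a negative binomial with the given mean and size."""
    if k < 0:
        return 0.0
    p = size / (size + mean)  # success probability implied by mean and size
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log1p(-p))
    return math.exp(log_pmf)

def time_to_death_pmf(days, shift=0, mean=23.9, size=7.9):
    """PMF of the delay in days; `shift` is an assumed, unspecified offset."""
    return nbinom_pmf(days - shift, mean, size)

# Sanity checks: probabilities sum to 1 and the mean matches the parameter.
total = sum(time_to_death_pmf(d) for d in range(400))
avg = sum(d * time_to_death_pmf(d) for d in range(400))
print(round(total, 6), round(avg, 3))
```

The log-gamma form avoids overflow for large delays and handles the non-integer size parameter directly.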
The deaths on each day are randomly distributed to previous days based on the probability that infection took place on that day. The figure below illustrates how this works. For example, in the United States there were 3010 deaths recorded on Apr 14. This number shows up on the top right in the row corresponding to Apr 14. These 3010 deaths are then distributed to previous days according to the time-to-death distribution. For example, we can see from the PMF plot above that about 4.46% of the deaths on any day will be assigned to 20 days in the past. Thus, we would expect about 134 deaths to be assigned to Mar 25, which is 20 days before Apr 14. The figure shows results for one simulation, which assigned 113 deaths to Mar 25. This indicates that 113 of the people who died on Apr 14 were estimated to have been infected on Mar 25.
Likewise for Apr 13, 74 of the 1988 people who died were estimated to have been infected on Mar 25. Adding all the numbers in the Mar 25 column, we find that a total of 853 of the people who have died were estimated to have been infected on Mar 25.
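The random assignment described above can be sketched as a weighted draw of a delay for each death. This is a minimal illustration, not the site's actual code: `assign_infection_days` is a hypothetical name, and `toy_pmf` is a placeholder triangular distribution over 0 to 40 days standing in for the real time-to-death PMF.

```python
import random

def assign_infection_days(n_deaths, delay_pmf, rng):
    """Return counts[d] = number of deaths assigned to d days before death."""
    delays = list(range(len(delay_pmf)))
    draws = rng.choices(delays, weights=delay_pmf, k=n_deaths)
    counts = [0] * len(delay_pmf)
    for d in draws:
        counts[d] += 1
    return counts

# Placeholder PMF (an assumption, not the fitted distribution): peak at 20 days.
weights = [21 - abs(d - 20) for d in range(41)]
toy_pmf = [w / sum(weights) for w in weights]

rng = random.Random(0)  # seeded so that a run is reproducible
counts = assign_infection_days(3010, toy_pmf, rng)

# Expected count 20 days back under the 4.46% figure quoted in the text.
print(round(3010 * 0.0446))  # 134
print(counts[20])            # one simulated draw; varies with the seed and PMF
```

Each simulation gives a different count for a given day, which is why a single run can show 113 where the expectation is about 134.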
Before we can use this number to estimate the infection count, we have to make an adjustment to account for the people who were infected on Mar 25 but for whom not enough time has elapsed to know whether they will survive. Referring to the CDF (cumulative distribution function) plot above, we expect 39.9% of the deaths from COVID-19 to occur within the first 20 days from infection. This implies that the 853 deaths assigned so far represent only about 39.9% of the deaths that we can expect to be attributed to Mar 25 once more time elapses. Dividing 853 by 39.9% to adjust for future deaths, we get an estimate of 2135.70 for the number of people infected on Mar 25 who will unfortunately succumb to COVID-19.
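The censoring adjustment is a single division, shown here as a small sketch. The function name `adjust_for_censoring` is hypothetical, and the CDF value is the rounded 39.9% quoted in the text.

```python
def adjust_for_censoring(assigned_deaths, cdf_at_elapsed_days):
    """Scale up deaths assigned so far by the fraction expected to be observed."""
    if not 0 < cdf_at_elapsed_days <= 1:
        raise ValueError("CDF value must be in (0, 1]")
    return assigned_deaths / cdf_at_elapsed_days

adjusted = adjust_for_censoring(853, 0.399)
print(round(adjusted, 1))  # 2137.8
```

With the rounded 39.9% this gives roughly 2138. The 2135.70 quoted in the text would correspond to a slightly larger, unrounded CDF value; that is an inference from the arithmetic, not something the source states.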
Taking that number and dividing by the estimated IFR provides an estimate of the total number of people infected on Mar 25:

| fatality rate | 3.0% | 2.0% | 1.0% | 0.5% |
|---|---|---|---|---|
| number infected | 71,190 | 106,785 | 213,570 | 427,140 |
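The table values can be reproduced by dividing the censoring-adjusted death count by each candidate IFR. This is a minimal sketch; `infections_from_deaths` is a hypothetical helper name, and 2135.70 is the figure taken from the text.

```python
def infections_from_deaths(expected_deaths, ifr):
    """Estimated infections given eventual deaths and an infection fatality rate."""
    return expected_deaths / ifr

expected_deaths = 2135.70  # censoring-adjusted deaths for Mar 25, from the text
for ifr in (0.03, 0.02, 0.01, 0.005):
    estimate = infections_from_deaths(expected_deaths, ifr)
    print(f"IFR {ifr:.1%}: {estimate:,.0f}")
```

Halving the assumed IFR doubles the estimated infection count, so the choice of IFR dominates the scale of the result.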
Repeating this procedure for every day gives the estimated infection counts over time. Estimated infection counts for days close to the current day will have a high level of uncertainty because there is limited fatality data available. To help control the erratic behavior, the estimates are adjusted slightly to encourage the log of the estimated infection counts to be linear. This has no practical effect on the estimates more than about 10 days from the current date.
The estimated infection count plots on the main page also include uncertainty bands (confidence intervals) formed by repeating this procedure 1000 times and shading in the 95% pointwise intervals (i.e., using the .025 and .975 quantiles).
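The pointwise bands can be sketched as follows, assuming each of the 1000 repetitions yields one estimated curve. The function names and the synthetic curves are illustrative placeholders, and the linear-interpolation quantile rule is one common choice that the source does not specify.

```python
import random

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

def pointwise_band(runs, lower=0.025, upper=0.975):
    """runs[i][t] is the estimate for day t in simulation i."""
    bands = []
    for day_values in zip(*runs):  # gather all simulations for one day
        ordered = sorted(day_values)
        bands.append((quantile(ordered, lower), quantile(ordered, upper)))
    return bands

rng = random.Random(0)
# Placeholder data: 1000 synthetic curves over 5 days (not real estimates).
runs = [[rng.gauss(100 * (t + 1), 10) for t in range(5)] for _ in range(1000)]
bands = pointwise_band(runs)
```

Because the quantiles are taken separately for each day, the band describes the uncertainty at each date rather than for the curve as a whole.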
TODO Details of the team